Modeling Lexical Cohesion for Document-Level Machine Translation
Authors
Abstract
Lexical cohesion arises from a chain of lexical items that establish links between sentences in a text. In this paper we propose three different models to capture lexical cohesion for document-level machine translation: (a) a direct reward model, where translation hypotheses are rewarded whenever lexical cohesion devices occur in them; (b) a conditional probability model, where the appropriateness of using lexical cohesion devices is measured; and (c) a mutual information trigger model, where a lexical cohesion relation is treated as a trigger pair and the strength of the association between the trigger and the triggered item is estimated by mutual information. We integrate the three models into hierarchical phrase-based machine translation and evaluate their effectiveness on the NIST Chinese-English translation tasks with large-scale training data. Experimental results show that all three models achieve substantial improvements over the baseline and that the mutual information trigger model performs better than the others.
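To make the idea behind model (c) concrete, the following is a minimal sketch of scoring a trigger pair with pointwise mutual information. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not give the exact formula or the estimation procedure, so the document-level co-occurrence counts, the function names `build_counts` and `pmi`, and the toy data are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def build_counts(documents):
    """Count, per document, which words and unordered word pairs occur.

    `documents` is a list of token lists. Counting at the document level
    is an assumption made for this sketch.
    """
    word_counts = Counter()
    pair_counts = Counter()
    for tokens in documents:
        vocab = sorted(set(tokens))
        word_counts.update(vocab)
        # combinations over a sorted vocabulary yields pairs (a, b) with a < b
        pair_counts.update(combinations(vocab, 2))
    return word_counts, pair_counts, len(documents)


def pmi(x, y, word_counts, pair_counts, n_docs):
    """Pointwise mutual information: log( p(x, y) / (p(x) * p(y)) ).

    Probabilities are estimated as fractions of documents. Returns -inf
    when the pair never co-occurs in the data.
    """
    key = tuple(sorted((x, y)))
    joint = pair_counts[key]
    if joint == 0:
        return float("-inf")
    return math.log(joint * n_docs / (word_counts[x] * word_counts[y]))


# Toy corpus (hypothetical data, for illustration only).
docs = [
    ["bank", "loan"],
    ["bank", "loan"],
    ["river", "tree"],
    ["bank", "river"],
]
wc, pc, n = build_counts(docs)
strong = pmi("bank", "loan", wc, pc, n)   # pair co-occurs more often than chance
weak = pmi("bank", "river", wc, pc, n)    # pair co-occurs less often than chance
```

In this toy corpus the pair ("bank", "loan") receives a positive score and ("bank", "river") a negative one, which is the kind of association strength a trigger model could use as a feature when scoring translation hypotheses.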
Similar resources
Lexical Chain Based Cohesion Models for Document-Level Statistical Machine Translation
Lexical chains provide a representation of the lexical cohesion structure of a text. In this paper, we propose two lexical chain based cohesion models to incorporate lexical cohesion into document-level statistical machine translation: 1) a count cohesion model that rewards a hypothesis whenever a chain word occurs in the hypothesis, and 2) a probability cohesion model that further takes chain ...
Bilingual Lexical Cohesion Trigger Model for Document-Level Machine Translation
In this paper, we propose a bilingual lexical cohesion trigger model to capture lexical cohesion for document-level machine translation. We integrate the model into hierarchical phrase-based machine translation and achieve an absolute improvement of 0.85 BLEU points on average over the baseline on NIST Chinese-English test sets.
Document-Level Machine Translation Evaluation Metrics Enhanced with Simplified Lexical Chain
Document-level Machine Translation (MT) has been drawing more and more attention due to its potential for resolving sentence-level ambiguities and inconsistencies with the benefit of wide-range context. However, the lack of simple yet effective evaluation metrics largely impedes the development of such document-level MT systems. This paper proposes to improve traditional MT evaluation metrics by ...
Document-Level Machine Translation Evaluation with Gist Consistency and Text Cohesion
Current Statistical Machine Translation (SMT) is significantly affected by Machine Translation (MT) evaluation metrics. The emergence of document-level MT research has increased the demand for corresponding evaluation metrics. This paper proposes two superior yet low-cost quantitative objective methods to enhance traditional MT metrics by modeling document-level phenomena from the perspective...
Using Word Embeddings to Enforce Document-Level Lexical Consistency in Machine Translation
We integrate new mechanisms in a document-level machine translation decoder to improve the lexical consistency of document translations. First, we develop a document-level feature designed to score the lexical consistency of a translation. This feature, which applies to words that have been translated into different forms within the document, uses word embeddings to measure the adequacy of each w...